Data visualization and data science, Hadley Wickham

https://www.youtube.com/watch?v=9YTNYT1maa4

Technologies / techniques / methods that Wickham introduces or discusses

  • Grammar of Graphics / ggplot2 / layered plotting

Wickham is the author of ggplot2, which implements a grammar-of-graphics approach to building visualizations: variables in the data are mapped to aesthetics, drawn with geometric objects, and combined through layers and faceting. He emphasizes that a consistent grammar lets one build complex plots by composing simple building blocks. This “layered / modular” approach is a central piece of his visualization philosophy.
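The composition idea can be illustrated with a toy sketch in plain Python. This is not ggplot2's API: the names Plot, Layer, Facet, and describe are hypothetical, and the sample rows are made up. It only shows how a plot can be described as data plus aesthetic mappings plus a stack of independent layers, combined with `+`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Layer:
    geom: str  # hypothetical geometry name, e.g. "point" or "line"


@dataclass(frozen=True)
class Facet:
    by: str  # variable used to split the plot into panels


@dataclass(frozen=True)
class Plot:
    data: list                   # rows as dictionaries
    aes: dict                    # aesthetic name -> variable name
    layers: tuple = ()
    facet: Optional[str] = None

    def __add__(self, part):
        # Adding a part returns a new plot; the original is left unchanged.
        if isinstance(part, Layer):
            return replace(self, layers=self.layers + (part,))
        if isinstance(part, Facet):
            return replace(self, facet=part.by)
        return NotImplemented

    def describe(self):
        mapping = ", ".join(f"{k}={v}" for k, v in self.aes.items())
        lines = [f"{layer.geom}: {mapping}" for layer in self.layers]
        if self.facet:
            lines.append(f"facet by {self.facet}")
        return lines


# Made-up sample rows, for illustration only.
rows = [
    {"weight": 2.6, "mpg": 21.0, "cyl": 6},
    {"weight": 3.2, "mpg": 18.5, "cyl": 8},
]

base = Plot(data=rows, aes={"x": "weight", "y": "mpg"})
plot = base + Layer("point") + Layer("line") + Facet("cyl")
```

Each `+` adds one building block, so a complex figure is just a longer sum of simple parts, and the base plot can be reused with different layers.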

  • Tidy data / data cleaning / structuring

Before visualizing, Wickham insists on cleaning, structuring, and “tidying” the data, so that each variable is in its own column and each observation is in its own row. This is part of the “data science pipeline” that he often describes (import → tidy → transform → visualize → model / communicate).
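As a minimal sketch of what “tidying” means, the plain-Python example below reshapes a wide table (one column per year) into one row per observation. The helper name to_tidy and the sample values are assumptions made for illustration; in practice this step would normally be done with a data-manipulation library rather than by hand.

```python
def to_tidy(rows, id_key, var_name, value_name):
    """Turn wide rows into one row per (id, variable) observation."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


# Made-up wide table: the year is hidden in the column names.
wide = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 7, "2000": 9},
]

# After reshaping, "year" and "cases" are ordinary columns.
tidy = to_tidy(wide, id_key="country", var_name="year", value_name="cases")
```

In the tidy form, year becomes a variable that can be mapped to an axis or a color like any other column.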

  • Story / message first, then design

Decide the message or story you want to communicate before designing the visual representation. The visual form should serve the message, not the other way around.

  • Design principles / visual perception

He discusses limitations of human visual perception and the need to consider what designs are perceptually effective (e.g. choosing color scales well, avoiding misleading mappings). He warns against blindly trusting default color ramps (e.g. rainbow) and urges more thoughtful design.
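One way to see the problem with rainbow ramps is to check how brightness changes along the scale. The sketch below is an assumption-laden approximation, not something shown in the talk: it builds a rainbow by sweeping HSV hue from red to blue, and estimates luminance by applying Rec. 709 weights directly to the RGB values (ignoring gamma). The function names are hypothetical.

```python
import colorsys


def approx_luminance(rgb):
    # Rec. 709 weights applied to raw RGB values; gamma is ignored (approximation).
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def rainbow(n):
    # Sweep hue from red (0) to blue (2/3) at full saturation and value.
    return [colorsys.hsv_to_rgb((2 / 3) * i / (n - 1), 1.0, 1.0) for i in range(n)]


def gray(n):
    # Simple black-to-white ramp for comparison.
    return [(i / (n - 1),) * 3 for i in range(n)]


def is_monotonic(values):
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


rainbow_lum = [approx_luminance(c) for c in rainbow(9)]
gray_lum = [approx_luminance(c) for c in gray(9)]

print("rainbow monotonic:", is_monotonic(rainbow_lum))
print("gray monotonic:", is_monotonic(gray_lum))
```

Under these assumptions, brightness along the rainbow rises and falls several times while the data value only increases, which is one reason such a ramp can suggest structure that is not in the data.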

Overall, Wickham’s talk advocates a disciplined, principled approach to visualization using modern tools (R / tidyverse / ggplot2), grounded in human perception, reproducibility, and iterative design.

Main Points / Themes of the Talk

  • “Think about the data first”

Before choosing visuals, deeply understand your data: its structure, variables, noise, limitations. The visualization should be driven by the nature of the data and the question.

  • Decide on a message or story

A visualization is not meant to show “everything”; it should focus on what you want the audience to see and take away. The narrative behind the data helps guide the design choices.

  • Clean, structure, and preprocess the data

Good visualizations depend on good data. Problems like missing values, inconsistent formats, or badly structured tables lead to misleading graphics.
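A small sketch of this kind of preprocessing, using only the Python standard library. The missing-value markers, the accepted date formats, and the sample rows are all assumptions chosen for illustration; real data needs its own rules, and the day/month order in particular has to be confirmed with whoever produced the data.

```python
from datetime import datetime

# Assumed spellings of "missing" in the raw data.
MISSING = {"", "NA", "N/A", "null", "-"}

# Assumed date formats; "%d/%m/%Y" treats 02/03/2021 as 2 March, not 3 February.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def parse_date(text):
    """Return a date, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_number(text):
    """Return a float, or None for a missing-value marker."""
    cleaned = text.strip()
    if cleaned in MISSING:
        return None
    return float(cleaned.replace(",", ""))


# Made-up raw rows with mixed date formats and a missing value.
raw = [
    {"date": "2021-03-01", "sales": "1,200"},
    {"date": "02/03/2021", "sales": "NA"},
    {"date": "2021/03/03", "sales": "950"},
]

clean = [
    {"date": parse_date(r["date"]), "sales": parse_number(r["sales"])}
    for r in raw
]
```

Keeping missing values as None, rather than silently turning them into 0, lets the plotting step show a gap instead of a misleading dip.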

  • Use reproducible code / document your pipeline

Visualizations should be produced in code to allow reproducibility, transparency, and easy modification. The code itself should be readable text that anyone can rerun to get the same figure.
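To make the point concrete, here is a minimal sketch of a chart produced entirely from code, using only the Python standard library to emit SVG text. The function name bar_chart_svg, its parameters, and the sample counts are hypothetical; a real project would use a plotting library, but the principle is the same: the same script and the same data always yield the same figure.

```python
def bar_chart_svg(values, bar_width=40, gap=10, height=100):
    """Render positive numbers as a bar chart and return SVG markup.

    Assumes `values` is non-empty and every value is greater than zero.
    """
    top = max(values)
    bars = []
    for i, value in enumerate(values):
        bar_height = round(height * value / top)  # scale tallest bar to full height
        x = i * (bar_width + gap)
        y = height - bar_height                   # SVG y axis points downward
        bars.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
    width = len(values) * (bar_width + gap) - gap
    header = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    )
    return header + "".join(bars) + "</svg>"


# Made-up counts, for illustration only.
counts = [3, 7, 5]
svg = bar_chart_svg(counts)
```

Because the figure is plain text generated by a script, it can be kept under version control, reviewed like any other code change, and regenerated whenever the data is updated.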

Commentary & Critique

Wickham’s principles resonate strongly with good practices in modern data science. The emphasis on reproducibility, code-based graphics, and principled design is very timely in an era when many visualizations are made in point-and-click tools (with little transparency). Wickham’s keynote is more of a philosophical and methodological guide than a technical demo showcase. It reinforces the idea that good data visualization is a discipline combining data wrangling, programming, perception science, and storytelling. For practitioners, it is a useful reminder of what good practice looks like and what trade-offs to consider.